Rank | Count | Beginning |
---|---|---|
27811 | 2083 | Kana |
6302 | 1782 | Asi |
89626 | 1756 | Vanoti |
59615 | 1492 | Ndiri |
25433 | 1371 | Izvi |
4178 | 1086 | Anoti |
72923 | 946 | Saka |
22046 | 812 | Ini |
86017 | 777 | Vamwe |
51175 | 733 | Mutongi |
32726 | 687 | Kune |
37720 | 635 | Mai |
87865 | 549 | Vanhu |
48954 | 490 | Murume |
65487 | 488 | Nyaya |
22047 | 471 | “Ini |
36040 | 389 | Kwayedza |
95176 | 372 | Zvakadaro, |
46711 | 367 | Mumwe |
87087 | 361 | Vana |
99281 | 359 | Zvisinei, |
52725 | 358 | Mwana |
40404 | 356 | Mashoko |
63990 | 353 | Nhare |
44214 | 352 | Mudzimai |
97805 | 350 | Zvino |
15010 | 290 | Dr |
68883 | 288 | Pane |
1915 | 284 | Amai |
65492 | 280 | “Nyaya |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
Especially for N=1, we only need a small corpus to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
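The same count can be reproduced outside the database. The following is a minimal Python sketch, not the pipeline used to build the table above: the function name `sentence_beginnings`, its parameters, and the sample sentences are assumptions made for illustration. It mirrors `substring_index(sentence, ' ', N)` by splitting on single spaces and keeping the first N pieces, so the same function covers N=1, 2, 3, 4.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper: mirrors the SQL query above for a list of strings.
    """
    counts = Counter()
    for sentence in sentences:
        # Like substring_index(sentence, ' ', n): keep everything before the
        # n-th space, or the whole sentence if it has fewer than n spaces.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # Sorted by descending count, like ORDER BY cnt DESC LIMIT `limit`.
    return counts.most_common(limit)


# Toy input, invented for illustration only (not taken from the corpus).
sample = [
    "Kana vanhu vachiuya",
    "Kana zvakadaro",
    "Asi hazvina kudaro",
]

top_1 = sentence_beginnings(sample, n=1)
top_2 = sentence_beginnings(sample, n=2)
```

As in the SQL query, a token keeps any attached punctuation, which is why entries such as `Ini` and `“Ini` are counted separately in the table.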